GRAB - Inverted Indexes with Low Storage Overhead
Abstract
A search command, grab, that works from maintained indexes combines acceptably fast searching with very low storage overhead. It looks like grep except that it requires a preindexing pass, looks only for whole words, and runs faster. As an example of performance, consider the time to search for single words in a 7.8 Mbyte file (the Brown corpus of English). The times below are in seconds on a DEC 8600 running Ultrix; the space overhead is given as a percentage of the original file.
[Table: searches for the words Shakespeare, Dickens, and Chaucer, listing each word's number of uses with search times and space overhead; the column headers and values were garbled in extraction.]
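The abstract does not describe grab's index format. As a hedged illustration of the general idea it relies on (a preindexing pass that builds an inverted index, followed by whole-word lookups), here is a minimal Python sketch. Recording only which block of lines a word appears in, rather than its exact offset, and then rescanning those lines is an assumption chosen to show one way of keeping storage overhead low; it is not claimed to be grab's design. The names `build_index`, `search`, `WORD`, and `block_size` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: a "word" is a maximal run of ASCII letters or digits.
WORD = re.compile(r"[A-Za-z0-9]+")


def build_index(lines, block_size=64):
    """Preindexing pass: map each word to the set of block numbers in which it
    occurs, where a block is block_size consecutive lines (assumed layout)."""
    index = defaultdict(set)
    for lineno, line in enumerate(lines):
        for word in WORD.findall(line):
            index[word].add(lineno // block_size)
    return index


def search(lines, index, word, block_size=64):
    """Return (line number, line) pairs that contain word as a whole word.
    Only blocks listed in the index are rescanned, so the index never has to
    store exact positions."""
    hits = []
    # dict.get does not insert missing keys, even on a defaultdict.
    for block in sorted(index.get(word, ())):
        start = block * block_size
        for lineno in range(start, min(start + block_size, len(lines))):
            # Whole-word check uses the same tokenizer as the indexing pass.
            if word in WORD.findall(lines[lineno]):
                hits.append((lineno, lines[lineno]))
    return hits


if __name__ == "__main__":
    text = ["the cat sat", "concatenate", "a cat"]
    idx = build_index(text, block_size=2)
    print(search(text, idx, "cat", block_size=2))
```

A larger `block_size` shrinks the index, since each word needs fewer distinct postings, at the cost of rescanning more lines per query; "concatenate" is never reported for "cat" because matching is on whole words, as in grab.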
Similar resources
Building Space-Efficient Inverted Indexes on Low-Cardinality Dimensions
Many modern applications naturally lead to the implementation of inverted indexes for effectively managing large collections of data items. Creating an inverted index on a low cardinality data domain results in replication of data descriptors, leading to increased storage overhead. For example, the use of RFID or similar sensing devices in supply-chains results in massive tracking datasets that...
Efficient Phrase Querying with an Auxiliary Index
Search engines need to evaluate queries extremely fast, a challenging task given the vast quantities of data being indexed. A significant proportion of the queries posed to search engines involve phrases. In this paper we consider how phrase queries can be efficiently supported with low disk overheads. Previous research has shown that phrase queries can be rapidly evaluated using nextword index...
Taming Hot-Spots in DHT Inverted Indexes
DHT systems are structured overlay networks capable of using P2P resources as a scalable platform for very large data storage applications. However, their efficiency expects a level of uniformity in the association of data to index keys that is often not present in inverted indexes. Index data tends to follow nonuniform distributions, often power law distributions, creating intense local storag...
Efficient Query Processing on Term-Based-Partitioned Inverted Indexes
In a shared-nothing, parallel text retrieval system, queries are processed over an inverted index that is partitioned among a number of index servers. In practice, the inverted index is either document-based or term-based partitioned, depending on properties of the underlying hardware infrastructure, query traffic, and some performance and availability constraints. In query processing on term-b...
Journal title:
- Computing Systems
Volume 1, Issue
Pages -
Publication date 1988